Abstract: The data center network (DCN), wired or wireless,
features a large number of Many-to-One (M2O) sessions. Each
M2O session is currently operated based on Point-to-Point (P2P)
communications and Store-and-Forward (SAF) relays, and is
generally followed by certain further computation at the
destination. Different from this separate P2P/SAF-based-transmission
and computation strategy, this paper proposes STAC, a novel
physical layer scheme that achieves Simultaneous Transmission
and Air Computation in wireless DCNs. In particular, STAC takes
advantage of the superposition nature of electromagnetic (EM)
waves, and allows multiple transmitters to transmit in the same
time slot with appropriately chosen parameters, such that the
received superimposed signal can be directly transformed to the
needed summation at the receiver. Exploiting the static channel
environment and compact space in DCNs, we propose an enhanced
Software Defined Network (SDN) architecture to enable STAC,
where wired connections are established to provide the wireless
transceivers with external reference signals. Theoretical analysis
and simulation show that STAC can improve both the bandwidth
and energy efficiencies severalfold.

I. Introduction 

A modern Data Center (DC) typically consists of a large
dedicated cluster of commercial computers (work nodes) that
are housed together to store/process big files in a parallel
manner. Parallel storing/processing requires frequent
communications among the work nodes, which are accomplished
through Data Center Networks (DCNs). Today, the DCN is the
principal bottleneck in large DCs [1]. Despite its maturity in
deployment and high bandwidth, the wired DCN has a few critical
problems such as limited flexibility, cabling complexity, device
cost, oversubscription, etc. These problems severely limit the
efficiency and scalability of the DCN and are being exacerbated
now that a huge amount of information needs to be
stored/processed within the DC and exchanged through the DCN
in today's big data age.

To address this issue, several works have studied the
possibility of constructing wireless DCNs using high frequency
electromagnetic (EM) waves. The 60 GHz techniques were
suggested for realizing wireless DCN links with bandwidth
comparable to wireline connections, while the blockage and
directivity problems associated with the EM waves can be
significantly mitigated by utilizing ceiling reflection and 3D
beamforming. Free-space optical DCN communications were also
investigated, and were shown to achieve some further
improvements, including higher bandwidth and nearly perfect
directivity. On the other hand, from the structural perspective,
one work considered augmenting the wired DCN with added
wireless flyways, and another demonstrated that a completely
wireless DCN with a Cayley structure is feasible and performs
even better than the wired DCN.

Fig. 1. Illustration of one M2O session. Source nodes (solid circles) transmit
information to destination d via relay nodes (hollow circles).

The wireless DCN differs from today's ubiquitous wireless
networks in aspects ranging from traffic patterns to network
structures. These differences present new challenges, as well as
new possibilities, for designing more efficient wireless DCNs.
One particular challenge in DCs is the large number of
Many-to-One (M2O) sessions, which brings new problems for DCNs,
especially with the Point-to-Point (P2P) communication and the
Store-and-Forward (SAF) relay strategies. The M2O sessions
arise from various DC applications, e.g., the Google File System
(GFS) and the MapReduce framework. Due to the limited
transmission range of high frequency EM waves, these M2O
sessions are operated through multi-hopping over hierarchical
multiple-access units as shown in Fig. 1, where each hop is
based on the P2P communication and followed by the SAF
relay to the next hop. Specifically, in the multiple-access unit
as depicted, with Time Division Multiple-Access (TDMA), the
source nodes 1, 2, ..., K successively transmit their information
digits to the relay node 0 in different time slots with the P2P
strategy, and the relay stores all its received digits in the buffer
before forwarding them to the destination d. Since node 0's
buffer and input/output bandwidth are shared by all the
K source nodes, the transmission performance could be poor,
especially when K is large. The nearer to the destination, the
more severe this problem becomes, as the information that needs
to be transmitted accumulates along the way. In fact, the TCP
throughput collapse caused by M2O transmissions in data center
networks has been noted as the incast problem [10].

A. A New Scheme: STAC 

Rather than regarding the M2O traffic as a nuisance, we
propose a new physical layer scheme, dubbed STAC
(Simultaneous Transmission and Air Computation), to take
advantage of the superposition nature of EM waves and the
M2O transmissions. STAC is based on two key observations
on the distinguishing features of wireless DCNs.

Observation 1: One feature of DCs is that these M2O
sessions are generally followed by certain further computations
at the destination nodes. These computations normally
satisfy the commutative and associative laws, with
weighted summation being the typical case (e.g., in linear
network coded storage and MapReduce-based machine
learning applications). This opens up the possibility of
dividing a whole computation task into several sub-tasks that
can be conducted at the intermediate relay nodes, rather than
demanding that the final destination do all the jobs. In other
words, instead of forwarding all the received digits, the relay
could perform some intermediate computation and then forward
only the output of the computation, thereby utilizing the
bandwidth more efficiently.* Considering that the bottleneck of
the development of DCs lies in the DCN, not the compute
capabilities of the work nodes, we believe that such a
Compute-and-Forward (CAF) relay strategy is preferable to the
traditional SAF strategy for DCNs.
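
The difference between the two relay strategies can be illustrated with a minimal sketch. The helper names and the two-relay example data below are hypothetical and only serve to show that, because weighted summation is associative, forwarding one partial sum per relay gives the same result at the destination while sending fewer digits over the last hop.

```python
def store_and_forward(groups):
    """SAF: every relay forwards all (weight, digit) pairs it received;
    the destination alone computes the weighted summation."""
    forwarded = [pair for group in groups for pair in group]
    result = sum(w * s for w, s in forwarded)
    return result, len(forwarded)


def compute_and_forward(groups):
    """CAF: every relay forwards a single partial weighted sum;
    the destination only adds the partial sums."""
    partial_sums = [sum(w * s for w, s in group) for group in groups]
    return sum(partial_sums), len(partial_sums)


# Hypothetical example: two relays, each serving three source nodes.
groups = [
    [(1, 5), (2, 3), (1, 7)],  # (w_i, s_i) pairs behind the first relay
    [(3, 2), (1, 4), (2, 6)],  # (w_i, s_i) pairs behind the second relay
]
saf_result, saf_forwarded = store_and_forward(groups)
caf_result, caf_forwarded = compute_and_forward(groups)
```

In this example both strategies deliver the same weighted sum, but the relays forward six digits under SAF and only two under CAF.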

Observation 2: Another feature of DCs is the static, closed
environment, where all the work nodes are closely placed in
one relatively small room. As a result, the transceiver
positions and the channels between them are time invariant.
Moreover, with the indirect ceiling-reflection and the 60 GHz
techniques, the channel between transceivers is an indirect
Line of Sight (LoS) channel without multi-path effect.
These two facts help to ease the cooperative transmissions
among the nodes.

With respect to Observation 1, it suffices to illustrate STAC
for a particular multiple-access unit as depicted in Fig. 1.
Suppose that the receive node 0 is only interested in the
weighted summation $s_0$ of the $K$ source digits $s_1, s_2, \ldots, s_K$,

$$s_0 = \sum_{i=1}^{K} w_i s_i, \tag{1}$$

where $w_1, w_2, \ldots, w_K$ are the weight coefficients, and all the
quantities here are assumed to be real integers throughout this
paper. In STAC, the $K$ source nodes transmit their digits in the
same time slot with appropriately chosen transmit powers,
frequencies, phases and times, such that their information bearing
EM waves arrive at node 0 in a desired superimposed form that
can be transformed to $s_0$ directly. As will be shown, this new
STAC scheme significantly improves on the separate
P2P/SAF-based-transmission and computation strategy, in terms of
bandwidth and energy efficiencies. Additionally, in the general
case when node 0 needs to fully recover the original $K$
source digits, e.g., for performing some computation other than
weighted summation, one can still apply STAC by properly
designing a set of pseudo coefficients $\{w_1, w_2, \ldots, w_K\}$ such
that the original digits $s_1, s_2, \ldots, s_K$ can be extracted from
the received $s_0$.

* This can be regarded as a simple extension of the combiner operation from
the source node to the relay nodes.

To enable STAC, accurate channel state information 
(CSI) and perfect frequency/time synchronization among the 
transceivers are needed, both of which may be difficult to 
obtain in general wireless networks. Thanks to Observation 
2, however, the CSI in a DC is nearly time-invariant and can 
be accurately estimated. 

To accomplish the synchronization, as another contribution,
this paper proposes to use wired connections among all
the work nodes to provide the wireless transceivers with external
reference signals (e.g., a high quality external clock signal),
based on an enhanced Software Defined Network (SDN)
architecture. It should be pointed out that the wired
connections here are distinct from the information transmission
links in a wired DCN. The former are dedicated and
solely responsible for control signals, and carry neither the high
bandwidth nor the random traffic of the latter; they thus will
not cause the aforementioned problems encountered by wired
DCNs. We also remark that building such a wired control
network in DCs is plausible considering that the work nodes
are usually compactly piled up in a dedicated room of limited
size. As a by-product, it will also reduce the DCN operation
cost by eliminating the need for individual oscillators at
the transceivers.

II. Motivating Examples 

Two major DC applications are i) distributed file storage,
e.g., GFS and Hadoop Distributed File System (HDFS),
and ii) parallel big data processing, typically based on
the MapReduce style framework. We now present three
detailed DC application examples mentioned in Section I that
motivate our STAC scheme, where the first two correspond to
GFS and MapReduce, respectively, and the last one shows the
flexibility of STAC for general applications. Again, with task
division, we can concentrate our discussions on the
multiple-access unit depicted in Fig. 1.

Network Coded Storage. Due to the non-negligible node
failures in a DC, in distributed storage systems, a big file
is usually divided into many fixed-length data blocks that are
further protected by multiple replicas stored at different work
nodes.

For storage efficiency, network codes (or erasure codes) can
be applied [16], [17], where each node stores network
coded data blocks rather than their original forms. When a data
block is lost due to a node failure, it can be reconstructed at
a new node by performing the following algorithm
digit-by-digit:


Algorithm 1 Network Coded Recovery

1: $s_0 \leftarrow \sum_{i=1}^{K} w_i s_i$

2: $s_0 \leftarrow s_0 \bmod 2^q$


where $s_0$ denotes a digit from the lost data block requiring
recovery, $s_1, s_2, \ldots, s_K$ are digits from the data blocks stored
at the other nodes, $w_1, w_2, \ldots, w_K$ are the network coding
coefficients, and the modulo operation is due to the finite field
size $2^q$. Clearly, with STAC, we can achieve Step 1 of the
algorithm directly.
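
The two steps of Algorithm 1 can be written as a short sketch. The function name and the example values are hypothetical, and `q` is assumed to denote the digit width in bits, so that the modulus is `2**q`; plain integer arithmetic is used, exactly as in the two steps above.

```python
def network_coded_recovery(weights, digits, q):
    """Recover one lost digit from the surviving digits.

    Step 1: weighted summation s_0 = sum_i w_i * s_i.
    Step 2: reduction modulo 2**q.
    """
    s0 = sum(w * s for w, s in zip(weights, digits))
    return s0 % (2 ** q)


# Hypothetical example with three surviving 8-bit digits.
recovered = network_coded_recovery(weights=[3, 5, 7], digits=[10, 200, 33], q=8)
```

With STAC, only Step 1 would be carried out over the air; the modulo reduction remains a local operation at the receiving node.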

MapReduce Based Data Processing. Popularized by
Google, MapReduce is a dominant parallel big data processing
tool in DCs. In the MapReduce model, when the map nodes
finish the processing, their outputs with the same key are
sent to a specified reduce node for the final computations.
Such computations are also typically in the form of weighted
summations, e.g., for all machine learning algorithms
fitting the statistical query model, scientific processes,
parallel K-means, prefix sum and brute-force sorting [22],
document similarity comparisons [23], etc. Again, our STAC
scheme can be applied to achieve the simultaneous
transmissions and computations efficiently.

General Case. In DCs, there are quite a few other applications
to which the additional task division does not apply.
In such cases, where the receive node 0 needs the original source
digits, one can appropriately design a set of pseudo coefficients
$\{w_1, w_2, \ldots, w_K\}$ such that the source digits $s_1, s_2, \ldots, s_K$
can be extracted from $s_0$. In particular, suppose that for each
$i = 1, 2, \ldots, K$, $0 \le s_i \le 2^q - 1$; then choosing
$w_i = 2^{q(i-1)}$ yields

$$s_0 = \sum_{i=1}^{K} 2^{q(i-1)} s_i,$$

based on which all the source digits can be extracted with the
following algorithm:


Algorithm 2 Source Digits Extraction

1: $i \leftarrow 1$

2: while $i \le K$ do

3: $s_i \leftarrow s_0 \bmod 2^q$

4: $s_0 \leftarrow (s_0 - s_i)/2^q$

5: $i \leftarrow i + 1$

6: end while
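
As a rough illustration of the general case, the sketch below packs the source digits with the pseudo coefficients and then runs the extraction loop of Algorithm 2. The function names and example digits are hypothetical, and each digit is assumed to satisfy `0 <= s_i <= 2**q - 1`.

```python
def pack_digits(digits, q):
    """Weighted summation with pseudo coefficients w_i = 2**(q*(i-1)), i = 1..K."""
    if any(s < 0 or s > 2 ** q - 1 for s in digits):
        raise ValueError("every digit must fit in q bits")
    # enumerate() counts from 0, which matches the exponent q*(i-1) for i = 1..K.
    return sum((2 ** (q * i)) * s for i, s in enumerate(digits))


def extract_digits(s0, num_nodes, q):
    """Algorithm 2: peel off one q-bit digit per iteration."""
    digits = []
    for _ in range(num_nodes):
        s_i = s0 % (2 ** q)
        s0 = (s0 - s_i) // (2 ** q)
        digits.append(s_i)
    return digits


source_digits = [9, 0, 15, 4]
s0 = pack_digits(source_digits, q=4)
recovered = extract_digits(s0, num_nodes=4, q=4)
```

Because each digit occupies its own block of `q` bits in the sum, the superposition never mixes digits from different nodes, which is what makes the extraction exact.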


III. System Framework with STAC 
A. A Basic STAC Unit 

STAC is a general physical layer scheme that can be applied 
to wireless DCs with any structure, carrier frequency, etc. 



Fig. 2. A typical wireless DC layout. 


For illustration, consider a typical layout of the wireless
DC as shown in Fig. 2, where each rack contains multiple
work nodes and has an antenna array mounted on its top to
communicate with other racks (communications within a rack
are accomplished with intra-rack connections). As in prior work,
ceiling-reflecting and 3D beamforming techniques are adopted
to achieve an indirect LoS link between any two antenna arrays
without causing interference to others.

Suppose $K$ work nodes (in $K$ different racks) need to
transmit their digits $s_1, s_2, \ldots, s_K$ to node 0 for computing
the weighted summation as in (1). The operating principle of
STAC is illustrated in the following.

Each source node $i$ maps its digit $s_i$ to a baseband
modulated complex symbol $d_i$, and then up converts the symbol $d_i$
to a passband signal given, in equivalent complex form, by

$$x_i(t) = \sqrt{P_i}\, e^{j\theta_i'}\, d_i(t),$$

where $\theta_i'$ and $\sqrt{P_i}$ are the pre-equalizing phase and amplitude
coefficients, respectively. Suppose each node $i$ transmits at
time $t_i$ using 3D beamforming; then the received passband
signal $y(t)$ can be expressed as

$$y(t) = \sum_{i=1}^{K} h_i e^{-j\theta_i} \sqrt{P_i}\, e^{j\theta_i'}\, d_i(t - t_i - \tau_i) + n(t),$$

where $h_i e^{-j\theta_i}$ is the equivalent complex channel coefficient
from node $i$ to node 0, $\tau_i$ is the propagation delay for node $i$, and
$n(t)$ is a Gaussian noise of variance $\sigma^2$ for both the real and
imaginary dimensions. With accurate CSI, one can set

$$\theta_i' = \theta_i \quad \text{and} \quad t_i = t_0 - \tau_i, \tag{2}$$

such that the received signal simplifies to

$$y(t) = \sum_{i=1}^{K} h_i \sqrt{P_i}\, d_i(t - t_0) + n(t),$$

which, after down conversion and sampling at time $t = t_0$,
yields the baseband symbol¹

$$y = \sum_{i=1}^{K} h_i \sqrt{P_i}\, d_i + n. \tag{3}$$

¹The $h_i$ in (3) are real variables, so that the real and imaginary parts of
symbol $y$ can be separated. The sequel of this paper will only consider the
real part for simplicity.
Fig. 3. An enhanced SDN architecture.


Clearly, if each node $i$ sets

$$P_i = (w_i/h_i)^2, \tag{4}$$

then after eliminating the noise, node 0 can construct the
desired digit $s_0$ as in (1) from the symbol $y$ in (3).

From the above principle, we can see that time/frequency
synchronization and pre-equalization, such as (2) and (4),
are essential for STAC. They can be realized based on an
enhanced SDN architecture, as described next.
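
The effect of the pre-equalization can be checked numerically with a minimal sketch. It assumes positive weights, perfect timing alignment (so only one sampled symbol per node is modeled), and a channel coefficient of the form $h_i e^{-j\theta_i}$ that is cancelled by a transmit phase $e^{j\theta_i'}$ with $\theta_i' = \theta_i$; the function names, the phase sign convention and the example values are assumptions made for illustration.

```python
import cmath


def pre_equalize(weights, gains, phases):
    """Per-node (amplitude, phase) settings: sqrt(P_i) = w_i / h_i, theta_i' = theta_i.
    Assumes w_i > 0 and h_i > 0."""
    return [(w / h, theta) for w, h, theta in zip(weights, gains, phases)]


def received_symbol(symbols, gains, phases, settings, noise=0.0):
    """Sampled baseband symbol: sum over nodes of channel * pre-equalizer * d_i, plus noise."""
    y = complex(noise)
    for d, h, theta, (amp, theta_pre) in zip(symbols, gains, phases, settings):
        channel = h * cmath.exp(-1j * theta)  # assumed channel model h_i * e^{-j theta_i}
        y += channel * amp * cmath.exp(1j * theta_pre) * d
    return y


weights = [1, 2, 4]
gains = [0.8, 0.5, 1.25]    # hypothetical channel amplitudes h_i
phases = [0.3, -1.1, 2.0]   # hypothetical channel phases theta_i (radians)
symbols = [1, -1, 1]        # BPSK symbols d_i
settings = pre_equalize(weights, gains, phases)
y = received_symbol(symbols, gains, phases, settings)
```

With these settings the noise-free sample equals $\sum_i w_i d_i = 1 - 2 + 4 = 3$ up to floating-point rounding, independent of the individual channel gains and phases.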

B. An Enhanced SDN Architecture 

The DC generally works in a centralized control manner,
where the front servers, including the job scheduler and data
manager, manage all the work nodes. In current DCNs, control
signals and data traffic share the same network. Here, we
propose to use a dedicated low bandwidth wired control
network with an added network server as shown in Fig. 3,
based on an enhanced SDN architecture. As mentioned in
Section I, the feasibility of establishing the wired control
network is supported by the limited DC size and the fixed node
locations.

Our SDN architecture is an enhanced one in the sense that
it not only accomplishes networking control as in general
SDNs, but also provides the wireless transceivers with the
physical and upper layer configurations to enable STAC,
including the synchronization information, the physical layer
parameters such as powers, frequencies, phases and times, and
the scheduling/routing information.

Synchronization with External Reference Signals. External
reference signals are provided to all the transceivers
for synchronization. These include a high quality external
clock signal, with which individual crystal oscillators at the
transceivers are no longer needed and the operation cost
can thereby be reduced. These reference signals can also
help calibrate the wireless transceivers, e.g., reduce the errors
induced by device hardware differences.

Physical Layer Parameters. The network server maintains
a connection information table that stores important physical
layer parameters for each connection, such as the propagation
delay $\tau$, the channel coefficient $h e^{-j\theta}$ and the steering vectors
required for 3D beamforming. When a transceiver fails (or a
new one comes in), it informs the network server through the
control network to remove (add) it from (to) the connection
information table.
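
One possible shape for such a connection information table is sketched below. The class names, field names and methods are hypothetical and are not taken from any SDN controller; the sketch only shows a per-connection record holding the delay, the channel coefficient and the steering vector, with add and remove operations keyed by transceiver.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionEntry:
    delay: float           # propagation delay tau (seconds)
    gain: float            # channel amplitude h
    phase: float           # channel phase theta (radians)
    steering_vector: list  # 3D beamforming weights for this connection


@dataclass
class ConnectionTable:
    entries: dict = field(default_factory=dict)

    def add(self, tx, rx, entry):
        """Register (or update) the parameters of the link tx -> rx."""
        self.entries[(tx, rx)] = entry

    def remove_transceiver(self, node):
        """Drop every connection that involves a failed transceiver."""
        self.entries = {k: v for k, v in self.entries.items() if node not in k}

    def lookup(self, tx, rx):
        return self.entries[(tx, rx)]


table = ConnectionTable()
table.add("rack1", "rack0", ConnectionEntry(2.0e-8, 0.8, 0.3, [1.0, 0.5]))
table.add("rack2", "rack0", ConnectionEntry(2.5e-8, 0.5, -1.1, [0.7, 0.7]))
table.remove_transceiver("rack2")
```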

Scheduling/Routing. Also maintained by the network
server is a table storing the scheduling/routing information.
When a current task finishes or a new one needs to start,
the job scheduler informs the network server to update the
scheduling/routing information table, and then the network
server coordinates all the work nodes involved accordingly.

IV. Physical Layer Issues 
A. Modulation-Demodulation Mapping 

The modulation for STAC is the same as that for P2P
channels. However, their demodulation mappings are subtly
different: STAC demodulation maps a superimposed symbol,
which may not even belong to the transmit symbol set,
to the summation of the digits, whereas the P2P channel
demodulation maps a particular symbol from the transmit
symbol set to the corresponding digit.

STAC Modulation. Specifically, writing node $i$'s digit $s_i$
in the bit sequence form yields

$$[s_i(0), s_i(1), \ldots, s_i(l), \ldots, s_i(L-1)],$$

where $s_i(l)$ is the $l$-th bit, $L$ is the sequence length, and

$$s_i = \sum_{l=0}^{L-1} 2^l s_i(l).$$

For modulation, assume BPSK (Binary Phase Shift Keying)²
without error correction coding throughout this paper. At node
$i$, each bit $s_i(l)$ is modulated to a symbol $d_i(l) \in \{-1, +1\}$
as $d_i(l) = 1 - 2 s_i(l)$.

STAC Demodulation. After the $l$-th transmission and the
removal of noise with signal detection, the received
superimposed symbol can be written as

$$y(l) = \sum_{i=1}^{K} h_i \sqrt{P_i}\, d_i(l). \tag{5}$$

By setting the transmit power³ $P_i = (w_i/h_i)^2$, one has

$$y(l) = \sum_{i=1}^{K} w_i d_i(l), \tag{6}$$

which, through the operation

$$\frac{1}{2}\Big(\sum_{i=1}^{K} w_i - y(l)\Big),$$

yields the summation $\sum_{i=1}^{K} w_i s_i(l)$. Finally, the desired digit
$s_0$ can be constructed as

$$s_0 = \sum_{l=0}^{L-1} 2^l \sum_{i=1}^{K} w_i s_i(l) = \sum_{i=1}^{K} w_i \sum_{l=0}^{L-1} 2^l s_i(l) = \sum_{i=1}^{K} w_i s_i.$$

²STAC also applies with other modulations such as QPSK, QAM, OOK,
OFDM, etc. This paper only considers the simplest BPSK due to the same
reason mentioned in Footnote 1.

³With the unit power of $d_i$ in BPSK, the transmit power $P_i |d_i|^2$ simply
equals $P_i$.
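
The modulation and demodulation mappings can be traced end to end with a small noise-free sketch. It assumes that the powers already satisfy $P_i = (w_i/h_i)^2$, so that the superimposed symbol is $\sum_i w_i d_i(l)$, and it uses the relation $d_i(l) = 1 - 2 s_i(l)$ to turn that symbol into the weighted bit sum $\frac{1}{2}(\sum_i w_i - y(l))$. The function names and example values are hypothetical.

```python
def to_bits(digit, length):
    """Bit sequence [s(0), ..., s(L-1)] with digit = sum_l 2**l * s(l)."""
    return [(digit >> l) & 1 for l in range(length)]


def bpsk(bit):
    """BPSK mapping d = 1 - 2*s, i.e. bit 0 -> +1 and bit 1 -> -1."""
    return 1 - 2 * bit


def stac_demodulate(weights, digits, length):
    """Rebuild s_0 = sum_i w_i * s_i from the L superimposed symbols."""
    bit_rows = [to_bits(s, length) for s in digits]
    total_weight = sum(weights)
    s0 = 0
    for l in range(length):
        # Noise-free superimposed symbol y(l) = sum_i w_i * d_i(l).
        y = sum(w * bpsk(bits[l]) for w, bits in zip(weights, bit_rows))
        # Weighted bit sum: sum_i w_i * s_i(l) = (sum_i w_i - y(l)) / 2.
        bit_sum = (total_weight - y) // 2
        s0 += (2 ** l) * bit_sum
    return s0


weights = [1, 3, 2]
digits = [5, 6, 7]
s0 = stac_demodulate(weights, digits, length=3)
```

The integer division is exact because $\sum_i w_i - y(l) = 2\sum_i w_i s_i(l)$ is always even for integer weights.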

B. Signal Detection 

We now present a simple signal detection scheme for removing
the noise in (3) to obtain (5), and analyze its corresponding
SER (Symbol Error Rate). It suffices to consider only one
of the $L$ transmissions, and hence the index $l$ used in the last
subsection will be omitted.

Specifically, view the symbol $\sum_{i=1}^{K} w_i d_i$ in (6) as a point
of a non-standard PAM (Pulse Amplitude Modulation)
constellation that results from the weighted superposition of the
transmit BPSK constellations and hence may have unequal
distances between adjacent constellation points. A
simple detection scheme is to quantize the $y$ in (3) to its nearest
constellation point. Let $\pi$ be a permutation on $\{1, 2, \ldots, K\}$
such that $w_{\pi(j_1)} \le w_{\pi(j_2)}, \forall j_1 \le j_2$. We have the following
theorem regarding the SER with such detection.

Theorem 1: The SER with the nearest point detection is
upper bounded by

$$\mathrm{SER}_{\mathrm{STAC}} \le \left(1 - \frac{1}{2^K}\right) \mathrm{erfc}\!\left(\frac{1}{\sqrt{2}\sigma}\right), \tag{7}$$

where $\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\, dt$ is the complementary error
function, $\sigma^2$ is the variance of the noise, and the equality in
(7) holds when the distance between any two adjacent
constellation points is equal to 2, e.g., when $w_{\pi(j)} = 2^{j-1}$ or $1$,
$\forall j = 1, 2, \ldots, K$.

Proof Sketch: Since the $w_i$ are all real integers, the largest
SER is attained when the distance between any two adjacent
constellation points is 2, which includes the case of
$w_{\pi(j)} = 2^{j-1}$ or $1$, $\forall j = 1, 2, \ldots, K$.
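
The nearest point detection and the bound of Theorem 1 can be examined numerically with the following sketch. It assumes equiprobable bits, real Gaussian noise of standard deviation `sigma`, and pre-equalized reception so that the noise-free symbol is $\sum_i w_i d_i$; the function names are hypothetical. For a one-dimensional constellation, a point is detected wrongly exactly when the noise pushes it past the midpoint to a neighbor, which gives the per-point error probability used in `exact_ser`.

```python
import itertools
import math


def constellation(weights):
    """Map each achievable sum of w_i * d_i to its probability (equiprobable bits)."""
    probs = {}
    n = len(weights)
    for signs in itertools.product((-1, 1), repeat=n):
        point = sum(w * d for w, d in zip(weights, signs))
        probs[point] = probs.get(point, 0.0) + 1 / 2 ** n
    return dict(sorted(probs.items()))


def nearest_point(y, points):
    """Quantize the received symbol y to the closest constellation point."""
    return min(points, key=lambda p: abs(y - p))


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def exact_ser(weights, sigma):
    """SER of nearest point detection for the superimposed constellation."""
    probs = constellation(weights)
    pts = list(probs)
    ser = 0.0
    for k, p in enumerate(pts):
        err = 0.0
        if k > 0:
            err += q_func((p - pts[k - 1]) / (2 * sigma))
        if k < len(pts) - 1:
            err += q_func((pts[k + 1] - p) / (2 * sigma))
        ser += probs[p] * err
    return ser


def ser_bound(num_nodes, sigma):
    """Right-hand side of (7)."""
    return (1 - 1 / 2 ** num_nodes) * math.erfc(1 / (math.sqrt(2) * sigma))
```

For example, `exact_ser([1, 2, 4], 0.5)` and `exact_ser([1, 1, 1], 0.5)` both coincide with `ser_bound(3, 0.5)`, whereas weights such as `[1, 3]`, whose constellation has a wider gap between two of its points, stay below the bound.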

C. Performance of STAC

The performance of STAC is a tradeoff among SER, energy
efficiency and bandwidth efficiency, and clearly depends
on the weight coefficients. The air computation essence of
STAC and its advantage over the separate strategy can be best
illustrated in the ideal case of $w_1 = w_2 = \cdots = w_K = 1$,
where we will show that for fixed energy efficiency, STAC
achieves better SER and significantly improved bandwidth
efficiency.

On the other hand, to show that STAC uniformly
outperforms the separate strategy, we will consider the pseudo
coefficients case as mentioned in Section II, i.e.,
$w_{\pi(j)} = 2^{j-1}, \forall j$. The argument here is that by applying STAC with
the pseudo coefficients, one can recover the original $K$ source
digits, based on which summation with any weight coefficients
can be computed. We will show that in this case, STAC
achieves better energy efficiency for fixed SER and bandwidth
efficiency.


1) The Ideal Case: Suppose $w_1 = w_2 = \cdots = w_K = 1$,
which is the ideal case for STAC. The SER of STAC is given
in Theorem 1, i.e.,

$$\mathrm{SER}_{\mathrm{STAC}} = \left(1 - \frac{1}{2^K}\right) \mathrm{erfc}\!\left(\frac{1}{\sqrt{2}\sigma}\right).$$

Note that in this case, the resultant receive PAM constellation
has only $K + 1$, instead of $2^K$, points, where the decrease
of the constellation size is due to the “air computation”.
Equivalently, viewed from the energy perspective, this
advantage is reflected by the fact that the needed transmit
power now attains the minimum $P_i = 1/h_i^2$ for each node
$i$.

For the separate strategy, assume each node $i$ transmits with
the same power $P_i = 1/h_i^2$ as in STAC. The SER for node $i$
is a standard result, given by $\frac{1}{2}\mathrm{erfc}\!\left(\frac{1}{\sqrt{2}\sigma}\right)$. Combining all the
detected $K$ symbols, the receiver computes $\sum_{i=1}^{K} w_i s_i$, and
the resultant $\mathrm{SER}_{\mathrm{sep}}$ is characterized in the following theorem.

Theorem 2: The SER with the separate strategy is
lower bounded by

$$\mathrm{SER}_{\mathrm{sep}} \ge \frac{1}{2} - \frac{1}{2}\left(1 - \mathrm{erfc}\!\left(\frac{1}{\sqrt{2}\sigma}\right)\right)^{K},$$

where the equality is achieved when $w_1 = w_2 = \cdots = w_K = 1$.

Proof Sketch: The theorem can be proved by noting that the
number of erroneous symbols is a binomial random variable
with parameters $(K, p)$, where $p = \frac{1}{2}\mathrm{erfc}\!\left(\frac{1}{\sqrt{2}\sigma}\right)$ is the per-node
SER, and the computation result is wrong if and only if there is
an odd number of erroneous symbols when
$w_1 = w_2 = \cdots = w_K = 1$.

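Theorem 3: $\mathrm{SER}_{\mathrm{sep}} > \mathrm{SER}_{\mathrm{STAC}}$ for any $K \ge 2$.

Proof Sketch: Use mathematical induction.

Therefore, STAC achieves a better SER and simultaneously
improves the bandwidth efficiency by a factor of $K$. In
particular, note that as $K \to \infty$, $\mathrm{SER}_{\mathrm{STAC}} \to \mathrm{erfc}\!\left(\frac{1}{\sqrt{2}\sigma}\right)$ whereas
$\mathrm{SER}_{\mathrm{sep}} \to 1/2$.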
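
The two SER expressions for the ideal case can be compared numerically. The sketch below simply evaluates the STAC expression from Theorem 1 and the lower bound from Theorem 2 for a range of $K$; the function names and the choice `sigma = 0.5` are assumptions made for illustration.

```python
import math


def ser_stac(num_nodes, sigma):
    """SER of STAC in the ideal case w_1 = ... = w_K = 1."""
    return (1 - 1 / 2 ** num_nodes) * math.erfc(1 / (math.sqrt(2) * sigma))


def ser_sep_bound(num_nodes, sigma):
    """Lower bound on the SER of the separate strategy (Theorem 2)."""
    per_node = math.erfc(1 / (math.sqrt(2) * sigma))
    return 0.5 - 0.5 * (1 - per_node) ** num_nodes


sigma = 0.5  # hypothetical noise standard deviation
rows = [(k, ser_stac(k, sigma), ser_sep_bound(k, sigma)) for k in range(2, 21)]
for k, stac, sep in rows:
    print(f"K={k:2d}  SER_STAC={stac:.4f}  SER_sep>={sep:.4f}")
```

At this noise level the separate-strategy bound exceeds the STAC value for every $K$ in the range, and it drifts toward 1/2 as $K$ grows while the STAC value saturates.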

To achieve the same bandwidth efficiency, suppose each
node transmits $K$ bits in one symbol for separate transmission.
Then each node needs to increase its transmit power by at
least a factor of $2^K$, resulting in an SER greater than
$\mathrm{SER}_{\mathrm{sep}}$. In other words, STAC can improve the energy efficiency
by a factor of more than $2^K$ in the ideal case.

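2) Pseudo Coefficients Case: Consider a set of pseudo
coefficients $w_{\pi(j)} = 2^{j-1}, \forall j$. To minimize the total transmit
power $\sum_{i=1}^{K} P_i$ of STAC, we allocate these coefficients
among the $K$ nodes such that $h_{\pi(j_1)} \le h_{\pi(j_2)}, \forall j_1 \le j_2$, i.e.,
larger coefficients are assigned to nodes with larger channel gains.
Assuming STAC is completed within unit time, the total
transmit energy $E_{\mathrm{STAC}}$ is given by

$$E_{\mathrm{STAC}} = \sum_{i=1}^{K} \left(\frac{w_i}{h_i}\right)^2. \tag{8}$$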

We now calculate the total energy $E_{\mathrm{sep}}$ needed for the
separate strategy, assuming that each node transmits 1 bit to
the receiver within $1/K$ time to maintain the same bandwidth
efficiency as STAC. For the separate strategy to achieve an
SER similar to that of STAC, the distance between any adjacent
receive constellation points also needs to be 2, in which case
node $i$'s transmit power is given by

$$P_i = \frac{1}{h_i^2} \sum_{j=1}^{K} 4^{j-1}.$$

Therefore, the total energy needed is

$$E_{\mathrm{sep}} = \frac{1}{K} \sum_{i=1}^{K} \sum_{j=1}^{K} \frac{4^{j-1}}{h_i^2}, \tag{9}$$

where the factor $1/K$ accounts for the transmission time of
each node.

Theorem 4: $E_{\mathrm{sep}} \ge E_{\mathrm{STAC}}$, where the equality holds only
when the $h_i$ are the same for all $i$.

Proof Sketch: The proof utilizes the important fact that
$h_{\pi(j_1)} \le h_{\pi(j_2)}, \forall j_1 \le j_2$.

From Theorem 4, it can be concluded that STAC performs
uniformly better than the separate strategy for any set of
weight coefficients. This is because even requiring STAC to
fully recover the original $K$ source digits leads to better
energy efficiency than the separate strategy, for fixed SER and
bandwidth efficiency.
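
The energy comparison can be made concrete with a small numerical sketch. It assumes positive channel gains, pseudo coefficients $2^{j-1}$ assigned in increasing order of gain, a STAC energy of $\sum_i (w_i/h_i)^2$, and a separate-strategy power per node equal to $\sum_{j=1}^{K} 4^{j-1}/h_i^2$ used for a fraction $1/K$ of the time; the function names and the example gains are hypothetical.

```python
def stac_energy(gains):
    """E_STAC with w_pi(j) = 2**(j-1), smallest weight on the weakest channel."""
    ordered = sorted(gains)
    # enumerate() counts from 0, so 4**j equals (2**(j-1))**2 for j = 1..K.
    return sum(4 ** j / h ** 2 for j, h in enumerate(ordered))


def separate_energy(gains):
    """E_sep under the assumed per-node power, each node active for 1/K of the time."""
    k = len(gains)
    pam_power = sum(4 ** j for j in range(k))  # sum_{j=1}^{K} 4^(j-1)
    return sum(pam_power / h ** 2 for h in gains) / k


gains = [0.5, 0.8, 1.0, 1.6]  # hypothetical channel amplitudes h_i
e_stac = stac_energy(gains)
e_sep = separate_energy(gains)
```

For these gains the STAC energy is 51.25 while the separate strategy needs roughly 147.75; with identical gains the two quantities coincide, which matches the equality condition of Theorem 4.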

3) Discussion: The above analyzes two extreme cases of
the weight coefficients. In general, depending on the specific
weight coefficients, one has the freedom of dividing the $K$
nodes into $M$ groups ($1 \le M \le K$), and letting each group
transmit using STAC separately, to achieve a tradeoff between
the bandwidth efficiency and energy efficiency.
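
A rough sketch of this grouping tradeoff is given below for the pseudo coefficients case. It makes several simplifying assumptions that are not from the analysis above: nodes are dealt to groups round-robin after sorting by channel gain, each group uses one slot of equal duration, and within a group the coefficients $2^{j-1}$ are assigned in increasing order of gain. The function name and the example gains are hypothetical.

```python
def grouped_stac(gains, num_groups):
    """Return (slots used, total transmit power summed over all slots)."""
    ordered = sorted(gains)
    # Round-robin split; each group stays sorted by increasing gain.
    groups = [ordered[g::num_groups] for g in range(num_groups)]
    power = sum(4 ** j / h ** 2 for group in groups for j, h in enumerate(group))
    return num_groups, power


gains = [0.5, 0.8, 1.0, 1.6]  # hypothetical channel amplitudes h_i
tradeoff = [grouped_stac(gains, m) for m in range(1, len(gains) + 1)]
```

With $M = 1$ all nodes share one slot at the highest power, while $M = K$ reduces to one node per slot with weight 1 and the lowest power, so increasing $M$ trades bandwidth efficiency for energy efficiency.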

V. Conclusion 

The wireless DCN differs from general wireless networks in
that it has a large number of M2O sessions, which are normally
followed by further computations at the destinations, with
weighted summation being the typical case. Recognizing this,
we have proposed a novel physical layer scheme, STAC, that
achieves simultaneous transmissions and computations over
the air, and an enhanced SDN architecture to enable it. It is
demonstrated that with STAC, both the bandwidth and
energy efficiencies can be significantly improved.